Salary data for Iowa State University (ISU) employees is publicly available through the state government. CyChecks, an R package, was developed with the intent of examining and interpreting the university’s salary dataset. This package will allow users to compare pay between genders, positions, and departments, while displaying the results in a dynamic interface (shiny app). While this data is publicly available, it was also understood that salaries are sensitive, personal information and therefore an individual’s name that could easily be associated with their salary was anonymized. We hope Cychecks will provide useful fodder for discussing pay patterns within a publicly funded university.
The goal of CyChecks is to allow users easy access to the publicly available data concerning Iowa State University (the Cyclones) employee salaries. #### Some of the services this access could offer include:
Pay patterns and/or pay disparities have been an on-going topic of discussion and focus for research since [insert date]. The [insert law] aimed to dismantle these disparities but unfortunately the female gender is still under xx.
The state of Iowa offers a large amount of public data at this site. You can access the data by signing up for an API token here. Iowa state employee salaries are available at this site. Using an API token, CyChecks provides a function to easily get data for any given year. The data from this site does not include the employee’s home department. Unfortunately, after much exploring, the data linking names to departments is not easily accessible. Iowa State Univerisity’s Human Resources Department kindly provided a list of employees with their home departments and associated colleges valid as of January 1st 2019. Since acquiring this information is not reproducible to the average user, included is a full dataset (with names anonymized) of all salary info (from years 2007 to 2018) cross-referenced by department in the CyChecks package. It was discovered that employees that had left before the year 2019 will not listed under their respective department in this dataset. This isn’t ideal, but a reality of trying to display this sort of data.
Below you will see the structure of the data, which displays information about the size of the dataset (number of observations), the number of variables (categories), and the type of data that variable provides.
load(file = "C:/Users/lawto/Desktop/Spring2019/STAT585/finalproj/CyChecks/CyChecks/data/sals_dept_profs.rda")
#view(sals_dept_profs)
cychecks_dept_data <- sals_dept_profs
str(cychecks_dept_data)
#> 'data.frame': 16633 obs. of 12 variables:
#> $ base_salary_date : POSIXct, format: "2007-07-01" "2007-07-01" ...
#> $ fiscal_year : Factor w/ 12 levels "2007","2008",..: 1 1 1 1 1 1 1 1 1 1 ...
#> $ gender : Factor w/ 3 levels "M","F","*": 1 1 1 1 1 1 1 1 1 1 ...
#> $ place_of_residence : Factor w/ 963 levels "STORY","COOK",..: 1 1 1 1 7 1 1 1 1 1 ...
#> $ position : chr "professor" "professor" "professor" "professor" ...
#> $ total_salary_paid : num 82467 81893 102340 74169 120006 ...
#> $ travel_subsistence : num 4429 700 NA NA 6315 ...
#> $ base_salary : chr "84597" "71281" "94971" "74911" ...
#> $ organization : Factor w/ 18 levels "AMES LABORATORY OF US DOE",..: NA NA NA NA NA 7 7 7 3 2 ...
#> $ department : Factor w/ 233 levels "AMES LABORATORY",..: NA NA NA NA NA 86 97 99 36 2 ...
#> $ id : chr "94e35beb" "aa677a76" "ff1d3cf6" "b2540156" ...
#> $ position_simplified: chr "professor" "professor" "associate" "professor" ...head(cychecks_dept_data)
#> base_salary_date fiscal_year gender place_of_residence position
#> 1 2007-07-01 2007 M STORY professor
#> 2 2007-07-01 2007 M STORY professor
#> 3 2007-07-01 2007 M STORY professor
#> 4 2007-07-01 2007 M STORY professor
#> 5 2007-07-01 2007 M POLK professor
#> 6 2007-07-01 2007 M STORY professor
#> total_salary_paid travel_subsistence base_salary
#> 1 82467.0 4429.17 84597
#> 2 81893.4 700.00 71281
#> 3 102340.3 NA 94971
#> 4 74169.0 NA 74911
#> 5 120006.0 6315.17 127306
#> 6 80801.6 5791.86 72465
#> organization department id
#> 1 <NA> <NA> 94e35beb
#> 2 <NA> <NA> aa677a76
#> 3 <NA> <NA> ff1d3cf6
#> 4 <NA> <NA> b2540156
#> 5 <NA> <NA> 74eb0279
#> 6 COLLEGE OF LIBERAL ARTS & SCIENCES EEOB-LAS 9d2c3ad7
#> position_simplified
#> 1 professor
#> 2 professor
#> 3 associate
#> 4 professor
#> 5 professor
#> 6 associatenames(cychecks_dept_data)
#> [1] "base_salary_date" "fiscal_year" "gender"
#> [4] "place_of_residence" "position" "total_salary_paid"
#> [7] "travel_subsistence" "base_salary" "organization"
#> [10] "department" "id" "position_simplified"This package (CyChecks) is an R package that helps a user conveniently “check” various information concerning salary information for Iowa State University employees and faculty. CyChecks
CyChecks currently includes several functions that help the user navigate and explore this data more easliy, while also 1. A function to: 1. download data directly from the iowa.gov website 2. anonymize names of downloaded data 3. simplify professor and post-doc position titles 4. quickly identify departments with possible gender pay inequities 1. Data from: i. Employee/salary/department dataset for 2007-2018 ii. Employee/salary/department/college dataset assumed to be valid for 2018
-Get data from the web
#insert example of what function executes-convert each individual’s personal name to an alphanumeric id to mask the individual’s real identity
#insert example of what function executes-find the pay gap between different positions
#insert example of what function executesRun stats to find significance between male versus female with respect to total salary
#insert example of what function executesThe following graphs display some examples of the pay pattern across departments and over the years for different positons at Iowa State University.
#used packages
library(dplyr)
#> Warning: package 'dplyr' was built under R version 3.5.2
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(tidyverse)
#> -- Attaching packages -------------------------------------------------------------------------- tidyverse 1.2.1 --
#> v ggplot2 3.1.0 v readr 1.3.1
#> v tibble 2.0.1 v purrr 0.3.0
#> v tidyr 0.8.2 v stringr 1.3.1
#> v ggplot2 3.1.0 v forcats 0.3.0
#> Warning: package 'ggplot2' was built under R version 3.5.2
#> Warning: package 'tibble' was built under R version 3.5.2
#> Warning: package 'tidyr' was built under R version 3.5.2
#> Warning: package 'readr' was built under R version 3.5.2
#> Warning: package 'purrr' was built under R version 3.5.2
#> -- Conflicts ----------------------------------------------------------------------------- tidyverse_conflicts() --
#> x dplyr::filter() masks stats::filter()
#> x dplyr::lag() masks stats::lag()
library(ggplot2)
###Load data
load(file = "C:/Users/lawto/Desktop/Spring2019/STAT585/finalproj/CyChecks/CyChecks/data/sals_dept.rda")
#view(sals_dept)
cychecks_data <- sals_dept
#view(cychecks_data)
load(file = "C:/Users/lawto/Desktop/Spring2019/STAT585/finalproj/CyChecks/CyChecks/data/sals_dept_profs.rda")
#view(sals_dept_profs)
cychecks_dept_data <- sals_dept_profs
#view(cychecks_dept_data)
#clean data
#sals_dept <- sals_dept %>% filter(!is.na(gender), gender != "*")
# sals_dept_profs <- sals_dept_profs %>% filter(!is.na(gender), gender != "*")
# department <- c("All departments", sort(unique(as.character(sals_dept$department))))
# fiscal_year <- c("All years", sort(unique(as.character(sals_dept$fiscal_year))))
#plot data to analyze
gender_pay <- cychecks_data %>%
ggplot(aes(x = fiscal_year, y = total_salary_paid)) +
geom_point(aes(col=gender)) +
labs(subtitle = "Gender Vs Total Salary Paid Each Year",
y="Total Salary Paid",
x="Year")
plotly::ggplotly(gender_pay) #interactive map
gender_pay2 <- cychecks_data %>%
ggplot(aes(x = total_salary_paid)) +
geom_bar(aes(col=gender)) +
xlim(c(0, 2020)) +
ylim(c(0, 100)) +
labs(subtitle = "Gender Vs Total Salary Paid Each Year",
y="Total Salary Paid",
x="Year")
plotly::ggplotly(gender_pay2)
#> Warning: Removed 103737 rows containing non-finite values (stat_count).
#> Warning: position_stack requires non-overlapping x intervals#plot(gender_pay2)Follow the link